3  Good Coding Practices in R

By the end of this section, you should be familiar with some important features of good coding. You’ll be expected to adhere to these during your time on the MSc.

3.1 Why Good Practices Matter

Sometimes, in the rush to create code that works, it’s easy to forget how to write ‘good’ code.

There are three principles that we need to bear in mind when writing our code:

  • Readability: Code is more often read than written, so making it easily understandable is crucial. This is particularly important when your code is going to be used by others.

  • Maintainability: Good practices ensure that your code can be easily maintained and updated by anyone, not just its original author. So (for example) you might want to stick with using R packages that are frequently updated, rather than esoteric ones that might not be updated regularly (or at all).

  • Efficiency: Well-written code can save time and resources during execution. Using programming techniques like loops and functions not only helps the readability of your code, but can also improve performance and memory management.

3.2 Good Practices in R Programming

There are many good online resources that discuss good practices in programming. For our purposes at this introductory level, some key things to remember are:

Code formatting

  • Consistent Naming Conventions: Use meaningful variable and function names that reflect their purpose. For R, using snake_case (e.g., average_height) is my preferred approach. You will also frequently see camelCase used.

  • Indentation and Spacing: Use consistent indentation (e.g., 2 or 4 spaces) to define blocks of code. This makes your code easier to read.

  • Avoid Deep Nesting: Try to limit the depth of nesting; more than three levels deep can make your code harder to follow.
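
As a minimal sketch of these three points, the function below uses snake_case names, two-space indentation, and early returns in place of nested if/else blocks. The function name classify_height and the 150 cm and 180 cm cut-offs are invented purely for illustration.

```r
# Hypothetical example: the function name and the cut-offs are assumptions
classify_height <- function(height_cm) {
  # Guard clause: deal with unusable input first so the main logic stays flat
  if (is.na(height_cm)) {
    return(NA_character_)
  }
  if (height_cm < 150) {
    return("short")
  }
  if (height_cm < 180) {
    return("medium")
  }
  "tall"
}

classify_height(172)
```

Each condition sits at the same indentation level, so the function never goes more than one level deep. Note that this sketch expects a single value rather than a whole vector.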

Commenting and documentation

  • Inline Comments: Use comments to explain “why” something is done rather than “what” is done; describe the “what” only when it isn’t obvious from the code itself.

  • Documentation: Document functions and their arguments, detailing expected inputs, the output, and side effects if any. R’s roxygen2 package can be used to facilitate this.
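
The sketch below shows one way these two points might look in practice. The function convert_cm_to_m is a hypothetical example; the lines beginning with #' are roxygen2-style comments using the @param, @return and @examples tags. Because they are ordinary R comments, the script runs even if roxygen2 is not installed.

```r
#' Convert a height from centimetres to metres
#'
#' @param height_cm Numeric vector of heights in centimetres.
#' @return Numeric vector of heights in metres, the same length as the input.
#' @examples
#' convert_cm_to_m(c(165, 180))
convert_cm_to_m <- function(height_cm) {
  # Divide by 100 because the BMI formula expects metres, not centimetres
  height_cm / 100
}

convert_cm_to_m(c(165, 180))
```

The inline comment records the reason for the conversion, which a reader could not work out from the division alone.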

Efficient coding

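  • Vectorisation over Loops: Where possible, use vectorised operations instead of explicit loops. For example, mean(x) or x * 2 operates on a whole vector at once and is usually faster than an equivalent for loop. Functions such as vapply() can make looping code more concise, but they still call an R function once per element, so they are not vectorised in this sense.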

  • Pre-allocate Memory: When working with large data structures, pre-allocating memory can lead to significant performance gains.
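
To make both ideas concrete, here is a small sketch with illustrative variable names (heights_cm, heights_m and heights_m_loop are assumptions, not part of the course material). It performs the same conversion twice: once as a vectorised expression, and once in a loop that fills a pre-allocated vector.

```r
heights_cm <- c(165, 170, 175, 180, 185)

# Vectorised: one expression converts every element at once
heights_m <- heights_cm / 100

# Loop version, worth using only when a step cannot be vectorised.
# The result vector is created at its full length first, then filled in place,
# rather than being grown by one element on every iteration.
heights_m_loop <- numeric(length(heights_cm))
for (i in seq_along(heights_cm)) {
  heights_m_loop[i] <- heights_cm[i] / 100
}

# Check that both approaches give the same values
identical(heights_m, heights_m_loop)
```

For a vector of five values the difference in speed is negligible; the habit matters once the data run to many thousands of elements.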

Modular programming

  • Functions: Break code into reusable functions to avoid repetition and improve clarity.

  • Scoping: Understand and utilise scoping rules to avoid inadvertently modifying global variables.
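
The following sketch illustrates the scoping point; the names counter and increment_locally are hypothetical. An assignment made with <- inside a function creates a local variable, so the global variable of the same name is left unchanged.

```r
counter <- 0 # a global variable

increment_locally <- function() {
  # Reads the global counter, but the assignment creates a new local counter
  counter <- counter + 1
  counter
}

result <- increment_locally()

result  # the value returned by the function
counter # the global variable, which the function did not modify
```

To change the global value deliberately, assign the returned result to it (counter <- increment_locally()) rather than modifying it from inside the function.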

Here is a complete R script that shows some clear coding and employs some of the techniques listed above. At this point, don’t worry if you can’t understand everything that’s going on. Try to read through the code and follow what’s happening, and notice whether (and how) the comments and the variable names are helpful.

# 1. Consistent Naming Conventions

# In this code, I use snake_case consistently throughout the code
get_average_height <- function(heights_vector) {

  # 2. Indentation and Spacing

  # Indent code within functions and use spacing around operators
  number_of_people <- length(heights_vector) # count of people

  # 3. Avoiding Deep Nesting

  # Instead of deeply nested if statements, use early returns or logical operators
  if (number_of_people == 0) {
    return(NA) # Return NA early: the average of no values is undefined
  }

  total_height <- sum(heights_vector) # sum of all heights

  # Calculate average height
  average_height <- total_height / number_of_people

  return(average_height)
}

# 4. Inline Comments

# I can provide 'inline' comments to explain complex logic or important steps
ages_vector <- c(25, 30, 35, 40, 45) # A vector of ages
heights_vector <- c(165, 170, 175, 180, 185) # A vector of corresponding heights

# 5. Vectorisation Instead of Loops
# get_average_height() adds up the heights with the vectorised sum() rather than a for loop, which helps performance
average_height <- get_average_height(heights_vector)

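# 6. Allocation of Memory

# When filling a vector element by element in a loop, pre-allocate it instead of growing it
n <- length(ages_vector)
bmi_vector <- numeric(n) # Pre-allocated vector of n zeros for BMI values
# (The vectorised call in step 7 returns all n values at once, so no loop is needed here)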

# 7. Functions

# Encapsulate reusable logic within functions
calculate_bmi <- function(weight, height) {
  return(weight / (height / 100)^2)
}

# Use the function in a vectorised manner
weights_vector <- c(65, 70, 75, 80, 85) # A vector of weights
bmi_vector <- calculate_bmi(weights_vector, heights_vector)

# 8. Scoping

# Understand variable scoping to avoid unintentional side effects
# Variables created inside a function are local: they do not change the global
# environment unless the returned value is assigned there (or <<- is used)
print_bmi <- function(bmi_values) {
  print(bmi_values) # Local variable inside function
}

# Call the print_bmi function
print_bmi(bmi_vector)
[1] 23.87511 24.22145 24.48980 24.69136 24.83565